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Abstract —The problem of modulation classification for a 
mnltiple-antenna (MIMO) system employing orthogonal fre¬ 
quency division mnltiplexing (OFDM) is investigated under 
the assumption of nnknown frequency-selective fading channels 
and slgnal-to-noise ratio (SNR). The classification problem is 
formulated as a Bayesian inference task, and solutions are 
proposed based on Gibbs sampling and mean field variational 
inference. The proposed methods rely on a selection of the 
prior distributions that adopts a latent Dirichlet model for 
the modulation type and on the Bayesian network formalism. 
The Gibbs sampling method converges to the optimal Bayesian 
solution and, using nnmerical results, its accuracy is seen to 
improve for small sample sizes when switching to the mean field 
variational Inference technique after a number of Iterations. The 
speed of convergence is shown to Improve via annealing and 
random restarts. While most of the literatnre on modulation 
classification assume that the channels are flat fading, that the 
number of receive antennas is no less than that of transmit 
antennas, and that a large number of observed data symbols 
are available, the proposed methods perform well under more 
general conditions. Finally, the proposed Bayesian methods are 
demonstrated to Improve over existing non-Bayesian approaches 
based on independent component analysis and on prior Bayesian 
methods based on the ‘superconstellation’ method. 

Index Terms —Bayesian Inference; Modulation classification; 
MIMO-OFDM; Gibbs sampling; Mean field variational infer¬ 
ence; Latent Dirichlet model. 

1. Introduction 

Cognitive radio is a wireless communication technology 
that addresses the inefficiency of the radio resource usage 
via computational intelligence [1], [2]. Cognitive radios have 
both civilian and military applications [3]. A major task in 
such radios is the classification of the modulation format of 
unknown received signals. As the pairing of multiple-antenna 
(MIMO) transmission and orthogonal frequency division mul¬ 
tiplexing (OFDM) data modulation has become central to 
fourth generation (4G) and fifth generation (5G) wireless 
technologies, a need has arisen for the development of new 
classification algorithms capable of handling MIMO-OFDM 
signals. 

Modulation classification methods are generally classi¬ 
fied as inference-based or pattern recognition-based [3]. The 
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inference-based approaches fall into two categories, namely 
Bayesian and non-Bayesian methods [4]. Bayesian methods 
model unknown parameters as random variables with some 
prior distributions, and aim to evaluate the posterior probability 
of the modulation type. Non-Bayesian methods, instead, model 
unknown parameters as nuisance variables that need to be 
estimated before performing modulation classification. With 
pattern recognition-based methods, specific features are ex¬ 
tracted from the received signal and then used to discriminate 
among the candidate modulations. Compared to the pattern 
recognition-based approaches, inference-based methods gen¬ 
erally achieve better classification performance at the cost of 
a higher computational complexity [3]. 

There is ample literature on classification algorithms for 
single-antenna (SISO) systems [3], [5]-[20]. Among them, a 
Bayesian method using Gibbs sampling is proposed in [19]. 
In [20], a systematic Bayesian solution based on the latent 
Dirichlet Bayesian Network (BN) is proposed, which general¬ 
izes and improves upon the work in [19]. A preprocessor for 
modulation classification is developed in [21], whereby the 
timing offset is estimated using grid-based Gibbs sampling'. 
We note that, while most algorithms rely on the assumption 
that the channels are flat fading or additive white Gaussian 
noise channels, the approaches in [19]-[21] are well-suited also 
for frequency selective fading channels. 

Only few publications address modulation classification in 
MIMO systems [22]-[26]. The task of modulation classifica¬ 
tion for MIMO is more challenging than for SISO due to the 
mutual interference between the received signals and to the 
multiplicity of unknown channels. In [22], a non-Bayesian ap¬ 
proach, referred to as independent component analysis (ICA)- 
phase correction (PC), is proposed, where the channel matrix 
required for the calculation of the hypotheses test is estimated 
blindly by ICA [27]. Several related pattern recognition- 
based algorithms are introduced in [23]-[25], where source 
streams are separated by ICA-PC or a constant modulus 
algorithm, and diverse higher-order signal statistics are used 
as discriminating features. Moreover, a pattern recognition- 
based algorithm using spatial and temporal correlation func¬ 
tions as distinctive features is reported in [26] for MIMO 
frequency-selective channels with time-domain transmission. 
The approach is not applicable to modulation classification 
for MIMO-OFDM systems. As for MIMO-OFDM systems, a 
non-Bayesian approach is proposed in [28] based on ICA-PC 
and an assumed invariance of the frequency-domain channels 

’in grid-based Gibbs sampling, a grid of points is first selected within the 
domain of a variable x to be sampled. Then the conditional probability density 
function (normalized or non-normalized) of variable x is computed at each 
selected point, which is used for sampling x. 
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across the coherence bandwidth. It is finally noted that the 
results in this paper were partially presented in [29]. 

Main Contributions: In this work, we develop Bayesian 
modulation classification techniques for MIMO-OFDM sys¬ 
tems operating over frequency-selective fading channels, as¬ 
suming unknown channels and signal-to-noise ratio (SNR). 
Our main contributions are as follows. 

1) A modulation classification technique is proposed based 
on Gibbs sampling for MIMO-OFDM systems. Inspired 
by the latent Dirichlet models in machine learning [30], 
this approach leverages a novel selection of the prior 
distributions for the unknown variables, the modulation 
format and the transmitted symbols. This selection was 
first adopted by some of the authors in [20] for SISO sys¬ 
tems. As compared to SISO systems, a Gibbs sampling 
implementation such as in [20] may have an impracti- 
cally slow convergence due to the high-dimensional and 
multimodal distributions in MIMO systems. The strategy 
of annealing [31]-[33] combined with multiple random 
restarts [34]-[37] is hence proposed here to improve the 
convergence speed. 

2) An alternative Bayesian solution for modulation classi¬ 
fication in MIMO-OFDM systems that leverages mean 
field variational inference [38] is proposed, based on the 
same latent Dirichlet prior selection. 

3) A hybrid approach that switches from Gibbs sampling to 
mean field variational inference is proposed for modula¬ 
tion classification in MIMO-OFDM systems. The hybrid 
approach is motivated by the fact that the Gibbs sampler 
is superior to mean field as a method for exploring the 
global solution space, while the mean field algorithm 
has better convergence speed in the vicinity of a local 
optima [38]-[40]. 

4) Extensive numerical results demonstrate that the pro¬ 
posed Gibbs sampling method converges to an effective 
solution, and its accuracy improves for small sample 
sizes when switching to the mean field variational in¬ 
ference technique after a number of iterations. More¬ 
over, the speed of convergence is seen to be generally 
improved by multiple random restarts and annealing 
[31]-[37]. Overall, while most of the reviewed existing 
modulation classification algorithms for MIMO-OFDM 
systems work under the assumptions that the channels 
are flat fading [22]-[25], that the number of receive 
antennas is no less than the number of transmit antennas 
[22]-[25], and/or that the number of samples is large 
(as for pattern recognition-based methods) [23]-[26], 
the proposed method achieves satisfactory performance 
under more general conditions. 

The rest of the paper is organized as follows. The signal model 
is introduced in Sec. II. In Sec. Ill, we briefly review some 
necessary preliminary concepts, including Bayesian inference 
and BNs, while in Sec. IV, we formulate the modulation 
classification problem under study as a Bayesian inference 
task, and propose solutions based on Gibbs sampling and 
on mean field variational inference. Numerical results of the 
proposed methods are presented in Sec. V. Finally, conclusions 


are drawn in Sec. VI. 

Notation: The superscripts T and H denote matrix or 
vector transpose and Hermitian, respectively. The i-th row 
of the matrix B is denoted as .) and the j-th column 
is denoted as B(.j). Lower case bold letters and upper case 
bold letters are used to denote column vectors and matrices, 
respectively. The notation h\hi, where b = and 

i € {1, n}, denotes the vector composed of all the elements 

of b except bi. We use an angle bracket (•). to represent the 
expectation with respect to the random variables indicated in 
the subscript. For notational simplicity, we do not indicate 
the variables in the subscript when the expectation is taken 
with respect to all the random variables inside the bracket (•). 
The notations t/>(-) and !(•) stand for the digamma function 
[41] and the indicator function, respectively. The cardinality 
of a set B is denoted \B\. We use the same notation, p(-), 
for both probability density functions (pdf) and probability 
mass function (pmf). Moreover, we will identify a pdf or 
pmf by its arguments, e.g., p{X\Y) represents the distribution 
of random variable X given the random variable Y. The 
notations CAf{fi, C) and IQ (a, b) represent the the circularly 
symmetric complex Gaussian distribution with mean vector /r 
and covariance matrix C and the inverse gamma distribution 
with shape parameter a and scale parameter b, respectively. 

II. System Model 

Consider a MIMO-OFDM system operating over a 
frequency-selective fading channel with N subcarriers, Mt 
transmit antennas, Mr receive antennas and a coherence 
period of K OFDM symbols. All frequency-domain symbols 
transmitted during the coherence period are taken from a finite 
constellation A € A, such as M-PSK or M-QAM, where 
A is the (finite) set containing all possible constellations. 
Without loss of generality, A is assumed to be a constellation 
of symbols with average power equal to 1. The number of 
transmit antennas Mt is assumed known. We observe that 
several algorithms have been proposed for the estimation of 
Mt [42], [43]. We focus on the problem of detecting the 
constellation A in the absence of information about the SNR, 
the transmitted symbols and the fading channel coefficients. 

After matched filtering and sampling, assuming 
that time synchronization has been successfully 
performed at least within the error margin afforded 
by the cyclic prefix, the frequency-domain samples 
y[n,fc] = fc],..., /c]]^, received across the 

Mr receive antennas, at the n-th subcarrier of the fc-th OFDM 
frame, are expressed as 

y[n,/c] = H[n]s[n,/c]-f z[n, fc], (1) 

where H[n] is the Mr x Mt frequency-domain channel matrix 
associated with the n-th subcarrier; s[n, k] is the Mt x 1 vector 
composed of the symbols transmitted by the Mt antennas, 
i.e., s[n,fc] = [si[n, fc],..., SMt [n, fc]]^, with SmJnjfc] € A 
being the symbol transmitted by the mj-th transmit antenna 
over the n-th subcarrier of the fc-th OFDM symbol; and 
z[n,/c] = [zi[n,k]T--AMr[nA]Y' ~ is complex 
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white Gaussian noise, which is independent over indices n and 
k. The frequency-domain channel matrix H[n] can be written 

^1,1 N ••• hMt,i[n] 

H[n]= : ; , (2) 

hiM, N • • • fiMtMr- N 

where Jl],denotes the 

iV X 1 frequency-domain channel vector between the mj-th 
transmit antenna and the m^-th receive antenna. Assuming 
that the channel for any pair {mt,mr) has at most L symbol¬ 
spaced taps, we write with the 

L X 1 time-domain channel vector and W the N x L matrix 
composed of the first L columns of the DFT matrix of size 
N. Note that the channel is fixed within the coherence frame 
of K OFDM symbols. 

According to (1) and (2), the NK x 1 received frequency- 
domain signals at the m^-th 

receive antenna is given by 

Mt 

ynir ~ ^ ^ “f 2,7^^, TTlr = (3) 

mt = l 

where ym,[fc] = /c],..., 

Timt = IS an NK X N matrix 

representing the transmitted symbols with an 

N X N diagonal matrix whose (n, n) element is 
Smt[n, fc]; and 2 ,^, = ..., with 

Let us further define the NKMt x 1 vector s = 
[si,..,s;f]^ containing all the transmitted symbols with 
Sfc = [s[l, A:]^,..., s[A^, the LMtMr x 1 vector 

h = [h^,...,h|^ Y' for the time domain channels associated 
with all the transmit-receive antenna pairs, where = 

[h^„j ,...,h^ 77J ]^; and the Af A"Mr X 1 receive signal vector 
y = [yf,...,y^ ]^. The task of modulation classification is 
for the receiver to correctly detect the modulation format A 
given only the received samples y, while being uninformed 
about the symbols s, the channel h and the noise power a^. 
Using (1) and (3), the likelihood function p (y|A, s, h, ct^) of 
the observation is given by 

P (y|^,s,h,cr2) 

= Y[p{yY’A s[n,fc],H[n],cr^) 

n,fc 

^Wpiy^r- s,h™^,cr^) , (4) 

m-r 

with y, 7 rj(s, h„ 7 ^,(T 2 ) cr^l) 

and y[n, A:]|(s[n, A:], H[n], ct^) ^CAf(H[n]s[n, A:], cr^I). 

III. Preliminaries 

As formalized in the next section, in this paper we formulate 
the modulation classification problem as a Bayesian inference 
task. In this section, we review some necessary preliminary 
concepts. Specifically, we start by introducing the general task 
of Bayesian inference in Sec. III-A; we review the definition 
of BN, which is a useful graphical tool to represent knowledge 


about the structure of a joint distribution, in Sec. III-B; 
and, finally, we review approximate solutions to the Bayesian 
inference task, namely, Gibbs sampling in Sec. III-C, and mean 
field variational inference in Sec. III-D. 

A. Bayesian Inference 

Bayesian inference aims to compute the posterior probabil¬ 
ity of the variables of interest given the evidence, where the 
evidence is a subset of the random variables in the model. 
Specifically, given the values of some evidence variables 
©e = 9e, one wishes to estimate the posterior distribution 
of a subset of the unknown variables 0„ = [0i,..., ©g]^. 
We assume here for simplicity of exposition that all variables 
are discrete with finite cardinality. However, the extension 
to continuous variables with pdfs is immediate, as it will 
be argued. The conditional pmf of 0„ given the evidence 
0e = 9e is proportional to the product of a prior distribution 
p{&u) on the unknown variables 0„ and of the likelihood of 
the evidence p(0e|0„): 

p(©„|0e = 9e) OC p{&u)p{®e = 9e\@u)- (5) 

If one is interested in computing the posterior distribution of 
the unknown variable 0j, then a direct approach would be to 
write 

p(0j = %|©e = 0e) = ^ p{®u = 9u\®e = 9e). (6) 

e^\e, 

The inference task (6) is made difficult in practice by the mul¬ 
tidimensional summation over all the values of the variables 
©„\0j. Note also that, if the variables are continuous, the 
operation of summation is replaced by integration and a similar 
discussion applies. Next, we discuss the BN model. 

B. Bayesian Network 

A BN is an acyclic graph that can be used to represent useful 
aspects of the structure of a joint distribution. Each node in the 
graph represents a random variable, while the directed edges 
between the nodes encode the probabilistic influence of one 
variable on another. Node 0^ is defined to be a parent of 
<dj, if an edge from node 0^ to node 0^ exists in the graph. 
According to the BN’s chain rule [38], the influence encoded 
in a BN for a set of variables © = [0i,...,0j]^ can be 
interpreted as the factorization of the joint distribution in the 
form 

.7 

p(0) = J]p(0,|Pae,), (7) 

where we use Pae to denote the set of parent variables 
of variable 0^. Note that (7) encodes the fact that each 
variable <dj is independent of its ancestors in the BN, when 
conditioning on its parent variables Pae^ . In the following, 
we will find it useful to rewrite (7) in a more abstract way as 
[38] 

p{@)=\{f{BA, (8) 

4> 

where the product is taken over all J factor (j){BA = 
^(©jjPaej) with B^ = {0j,PaeJ- 
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C. Gibbs Sampling 

Markov chain Monte Carlo (MCMC) techniques provide 
effective iterative approximate solutions to the Bayesian infer¬ 
ence task (6) that are based on randomization and can obtain 
increasingly accurate posterior distributions as the number 
of iterations increases. The goal of these techniques is to 
generate M random samples from the desired 

posterior distribution p(0„|©e = 0e)- This is done by 
running a Markov chain whose equilibrium distribution is 
p(0„|©e = 0e). As a result, according to the law of large 
numbers, the multidimensional summation (or integration) (6) 
can be approximated by an ensemble average. In particular, 
the marginal distribution of any particular variable 0^ in ©„ 
can be estimated as 

p(0,=0,|0e = 0e)«^ E 1 ( 0 ^= ^ 4 ), (9) 

m=Mo + l 

where is the m-th sample for Qj generated by the Markov 
chain, and Mq denotes the number of samples used as burn-in 
period to reduce the correlations with the initial values [35]. 

Gibbs sampling is a classical MCMC algorithm that defines 
the aforementioned Markov chain by sampling all the variables 
in 0„ one-by-one. Specifically, the algorithm begins with a set 
of arbitrary feasible values for 0^. Then, at step m, a sample 
for a given variable Qj is drawn from the conditional distribu¬ 
tion p(0j |0„\0j, ©e). Whenever a sample is generated for a 
variable, the value of that variable is updated within the vector 
©„. It can be shown that the required conditional distributions 
p(0j|©u\0j, ©e) may be calculated by multiplying all the 
factors in the factorization (8) that contain the variable of 
interest and then normalizing the resulting distribution, i.e., 
we have 

p(0j|©„\0j, ©e) ex (10) 

0: 


Samples are generated, starting with a high temperature and 
ending with a low temperature [31]-[33]. 

D. Mean Field Variational Inference 

Mean field variational inference provides an alternative way 
to approach the Bayesian inference problem of calculating 
pfdj = 0j\&e = de). The key idea of this method is 
that of searching for a distribution q{&u) that is closest 
to the desired posterior distribution p{&u\9f.), in terms of 
the Kullback-Leibler (KL) divergence KL ((7(©„)||p(©„|0e))5 
within the class Q of distributions that factorize as the product 
of marginals, i.e., q{®u) = 11^=1 9(®i) [38, Ch. 11]. The 
corresponding variational problem is given as 

minimize KL (9(©„)||p(©„|0e)) (11) 

s.t. q e Q. 

By imposing the necessary optimality conditions for problem 

(11) , one can prove that the mean field approximation <?(©„) 
is locally optimal only if the proportionality [38] 

(?(0,)cxexp| ^ (12) 

holds for all j = 1,.., G, where the expectation in (12) is taken 
with respect to the distribution q{Bf\Qj) = q(Qu)/q{G)j)- 
The idea of the mean field variational inference is to solve 

(12) by means of fixed-point iterations (see [38] for details). 
It can be shown that each iteration of (12) results in a better 
approximation q to the target distribution p(©„|0e), hence 
guaranteeing convergence to a local optimum of problem (11) 
[38, Sec. 11.5.1]. Once an approximating distribution q{Qu) is 
obtained, an approximation of the desired posterior p(©„|0e) 
can be obtained as p(©„|0e) « q(&u), and the marginal pos¬ 
terior distribution may be approximated as p(0jj9e) ss q{0j)- 


where the right-hand side of (10) is the product of the factors 
in (8) that involve the variable Qj. 

Remark 1: In order for Markov chain Monte Carlo algo¬ 
rithms to converge to a unique equilibrium distribution, the 
associated Markov chain needs to be irreducible and aperiodic 
[38, Ch. 12]. In the context of the Gibbs sampling, a sufficient 
condition for asymptotic correctness of Gibbs sampling is that 
the distributions p{Qj\&u\Qj,®e) are strictly positive in 
their domains for all j. 

Remark 2: When applying Gibbs sampling to practical 
problems, in particular those with high-dimensional and mul¬ 
timodal posterior distribution p(0j|©„\0j, ©e), slow con¬ 
vergence may be encountered due to the local nature of the 
updates. One approach to address this issue is to run Gibbs 
sampling with multiple random restarts that are initialized with 
different feasible solutions [34]-[37]. Moreover, within each 
run, simulated annealing may be used to avoid low-probability 
“traps.” Accordingly, the prior probability, or the likelihood, 
may be parametrized by a temperature parameter T, such that 
a large temperature implies a lower reliance on the evidence 
aimed at exploring more thoroughly the range of the variables. 


IV. Bayesian Inference for Modulation 
Classification 

In this section, we solve the problem of detecting the 
modulation A & A hy adopting a Bayesian inference for¬ 
mulation. First, in Sec. IV-A, we discuss the problem of 
selecting a proper prior distribution, and argue that a latent 
Dirichlet model inspired by [30] and first used for modulation 
classification in [20], provides an effective choice. Then, based 
on this prior model, we develop two solutions, one based on 
Gibbs sampling, in Sec. IV-B, and the other based on mean 
field variational inference, in Sec. IV-C. 


A. Latent Dirichlet Bayesian Network 

According to (5), the joint distribution of the unknown 
variables (A,s, h, cr^) may be expressed 


P {a, s, h, y) oc p (^y A, s, h, p (A, s, h, a^) , (13) 


where the likelihood function p {y\A, s, h, cr^) is given in (4), 
and the term p (A, s, h, cr^) stands for the prior information 
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Figure 1. BN for the modulation classification scheme based on the 
factorization (13). The nodes inside the rectangle are repeated NK times. 


on the unknown quantities. The prior is assumed to factorize 
as 

p(A,s,h,a^) =p(A) {n P {^mt 1-^) ^ ■ 

n,k,mt 

• n ■ (14) 

rrit 

1) Conventional Prior: A natural choice for the prior 
distribution of the unknown variables (A,s, h, tr^) is given 
by A ^ uniform (A), Sm,t[n,k]\A ~ uniform(A), hmt,mr 
CA/'(0,aI), and ^ XQ{ao,(3o) with fixed parameters 
(a, ao,/3o) [20]. Recall that the inverse Gamma distribution 
is the conjugate prior for the Gaussian likelihood at hand, 
and that uninformative priors can be obtained by selecting 
sufficiently large a and /ip and sufficiently small ag [35]. The 
factorization (14) implies that, under the prior information, the 
variables A, and cP' are independent of each other, and 

that the transmitted symbols Smt [^j k] are independent of all 
the other variables in (14) when conditioned on the modulation 
A. The BN Qi that encodes the factorization given by (13), 
along with (4) and (14), is shown in Fig. 1. 

The Bayesian inference task for modulation classification of 
MIMO-OFDM is to compute the posterior probability of the 
modulation A conditioned on the received signal y, namely 

F(^|y)=X! /p (As,h,cr2|y) dhdcr^ 

S 

Following the discussion in Sec. Ill, the calculation in (15) 
is intractable because of the multidimensional summation 
and integration. Gibbs sampling (Sec. III-C) and mean field 
variational inference (Sec. III-D) offer feasible alternatives. 
However, the prior distribution (14) does not satisfy the 
sufficient condition mentioned in Remark 1, since some of 
the conditional distributions required for Gibbs sampling are 
not strictly positive in their domains. In particular, the required 
conditional distribution for modulation A can be expressed as 

p(A = a|s,h,cr^,y) (xp(A) p [n, fc]|A). (16) 

n,k,mt 

The conditional distribution term p(smt fc]|A = a) in 
(16) is zero for all values of Smt[n,k] not belonging to the 


constellation a, i.e., p{smt[n‘,k]\a) = 0 for Smt[i^,k] ^ a. 
Therefore, the conditional distribution p(A = a|s,h,(T^,y) 
is equal to zero if the transmitted symbols s do not belong 
to a. As a result, the Gibbs sampler may fail to converge to 
the posterior distribution (see, e.g., [21]). In order to alleviate 
the problem outlined above, we propose to adopt a prior 
distribution encoded on a latent Dirichlet BN Q 2 shown in 
Fig. 2. 

2) Latent Dirichlet BN: Next, we introduce the Gibbs 
sampler based on on a latent Dirichlet BN Q 2 in details. 
Accordingly, each transmitted symbol Smt [rr, k] is distributed 
as a random mixture of uniform distributions on the different 
constellations in the set A. Specifically, a random vector p^i 
of length |A| is introduced to represent the mixture weights, 
with Pa (a) being the probability that each symbol Sm,t[n,k] 
belongs to the constellation a C A. Given the mixture weights 
Pa, the transmitted symbols Smt k] ^6 mutually indepen¬ 
dent and distributed according to a mixture of uniform dis¬ 
tributions, i.e., p(s„Jn,fc]|pA) = 'Ea:s^,ln,k]&aPA (o) / |a|. 
The Dirichlet distribution is selected as the prior distribution 
of Pa in order to simplify the development of the proposed 
solutions, as shown in the following subsections. In particular, 
given a set of nonnegative parameters 7 = [ 71 , • • • , 7 |a|]^, we 
have Pa ~ Dirichlet ( 7 ) [38]. We recall that the parameter 
7 o has an intuitive interpretation as it represents the number 
of symbols in constellation a C A observed during some 
preliminary measurements. 



Figure 2. BN Q 2 for the modulation classification scheme based on the 
Dirichlet latent vaiiable p^. The nodes inside the rectangle are repeated NK 
times. 


The BN Q 2 encodes a factorization of the conditional 
distribution p(pA, s, h, cr^ |y) 


p(pA,s,h, cr^ly) 

exp (y PA,s,h, cr^) p (pa) i ]q p(smJn,fc]|pA) 


n,k,mt 


n P(hmt.mjp(cr^) , (17) 

rrit iTTlr 


where we have pa ~ Dirichlet ( 7 ) with a set of nonnegative 
parameters 7 = [ 71 , • • • , 7|^|]^ [38], p fc]|pA) = 

Sa- s„ [n fc]Ga (®) / l®l’ ™el the Other distributions are as 
in (4) and (14). The Bayesian inference task for modulation 
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classification is to compute the posterior probability of the 
mixture weight vector pA conditional on the received signal 
y, namely 

p{PA\y)=Y^ f p(pA\s,h,a^ dtida"^, (18) 

s 

and then to estimate A as the value that maximize the a 
posteriori mean of p^i: 

i = argmax(p^(a) (19) 

The proposed approach guarantees that all the conditional 
distributions needed for Gibbs sampling based on the BN Q 2 
are non-zero, and therefore the aforementioned convergence 
problem for the inference based on BN Qi is avoided. 


B. Modulation Classification via Gibbs Sampling 

In this subsection, we elaborate on Gibbs sampling for 
modulation classification. As explained in Sec. III-C, in or¬ 
der to sample from the joint posterior distribution (17), the 
distribution of each variable conditioned on all other variables 
is needed. According to (10), we have (for derivations see 
Appendix II): 

1) The conditional distribution of the vector p. 4 , given s, 
h, cr^ and y can be expressed as 

p [pa s, h, y) ~ Dirichlet (7 + c), (20) 

where c = [ci,--- ,C|^|]^, and Ca is the number of 
samples of transmitted symbols in constellation a & A', 

2) The distribution of transmitted symbols con¬ 

ditioned on P. 4 , s\smt [tt., fc], h, cA and y, is given by 

P [tt, fc] PA,S\S 

nit [n, A:],h, cr^.y^ 

o^p{smAn,k]\pA)p(y[n,k] s[n, A:], H[n], , (21) 

where we recall that when a new sample is generated 
for Srrit [tij k] , the new value is updated and used in 
computing subsequent samples in s; 

3) The required distribution for channel vector is 

given by 

h-rnt ,nir (PA,s,h\h^,,^^,a^y) 

(^mt ,mr-i ^mt ■,m-r ), (22) 

where we have 

(S™,.™.) (D™,W), (23) 

and 



4) The conditional distribution for conditioned on p. 4 , 
s, h, and y, is given by 

(T^|pAS,h,y~Zf;(a,/3), (25) 


where a = ao + NKMr and 3 = Bq -\- 

2 

Em, y-mr- - Emj ■ Note that (20) is a 

consequence of the fact that Dirichlet distribution is 
the conjugate prior of the categorical likelihood [38]; 
(22) can be derived by following from standard MMSE 
channel estimation results [44]; and (25) follows the 
fact that the inverse Gamma distribution is the conjugate 
prior for the Gaussian distribution [45]. 

We summarize the proposed Gibbs sampler for modulation 
classification in Algorithm 1. 

Algorithm 1 Gibbs Sampling 

• Initialize 0^^ = {p|^\ ( 7 ^*'°^} from prior dis¬ 

tributions as discussed in Sec. IV-A 

• for each iteration m = 1 : M 

- given draw a sample 

p^^ from p(pA|s,h, 0 - 2 , y) in ( 20 ); 

- given 

draw a sample [n, fc] from 

p(SmJn,fc]|pA,s\s 

mt [^5 k],h,a‘^,y) in ( 21 ); 

- given and 

the current sample values for , draw a sample 

from p(hmj,m,|(PA,s, h\hmj.m^,cr 2 ,y)) in ( 22 ); 

- given {p^\ draw sample from 

p(cr 2 |p. 4 S,h,y) in (25); 

• end for 

Remark 3: In [19], an alternative Gibbs sampling approach 
based on a “superconstellation” is proposed to solve the 
convergence problem at hand for modulation classification in 
SISO. The Gibbs sampling scheme in [19] can be viewed 
as an approximation of the approach based on the latent 
Dirichlet BN obtained by setting the prior distribution p .4 ~ 
Dirichlet ( 7 ) such that 7 = 0 and by setting the sample value 
of p^i to be equal to the mean of the conditional distribution 
p(PA|s,h,cr 2 ^y)^ j P^) ^ c/X^aeACo [ 20 ], where we 
recall that Ca is the number of symbols that belong to con¬ 
stellation a G A. The performance of the “superconstellation” 
approach extended to MIMO OFDM is discussed in Sec. V. 

Remark 4: When the SNR is high, the convergence speed 
is severely limited by the close-to-zero probabilities 
in the conditional distribution (21). Specifically, as 
the Gibbs sampling proceeds with its iterations, the 
samples of tend to be small, making the relationship 
between y[n, fc] and SmA^Tk] almost deterministic. In 
particular, the term p(y[n, fc] |s[n, fc], H[n], cr^) in (21) 
satisfies p(y[n, fc] [n, fc], [n], (cr^)^™)) ~ 1 

for the selected sample value [n, k] , and 

p(y[n, fc] |s('"^[n, fc], [n], (cr^)^™^) ~ 0 for all other 

possible values for s[n, A:]. As a result, transition between 
states with different values in the Markov chain occurs with a 
very low probability leading to extremely slow convergence. 
As discussed in Remark 2, the strategy of Gibbs sampling 
with multiple random restarts and annealing may be adopted 
to address this issue. For simulated annealing, we substitute 
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the conditional distribution (25) for with an iteration 
dependent prior given as [33] 

PAS,'h,y {a', 13), (26) 

where we have a'{m) = (1 — (1 — po) exp{—m/mo))a, 
with m denoting the current iteration index, po = 0.1 and 
mo = 0.3M, where M is the total number of iterations. 
For multiple restarts, we propose to use the entropy of the 
pmf (pyi) , I ,, estimated in a run as the metric, to choose 
among the Nmn runs of Gibbs sampling which one should be 
used in (19). Specifically, the run with the minimum entropy 
estimate (PA)p(p^ly) is selected. The rationale of this choice 
is that an estimate (PA)p(-p^jy) with a low entropy identifies 
a specific modulation type with a smaller uncertainty than an 
estimate (PA)p(p^ly) with higher entropy (i.e., closer to a 
uniform distribution). 

C. Modulation classification via Mean Field Variational In¬ 
ference 

Following the discussion in Sec. III-D, the goal of 
mean field variational inference applied to the prob¬ 
lem at hand is that of searching for a distribution 
q{pA, s, h, cr^) that is closest to the desired posterior distribu¬ 
tion p(pa, s, h, CT^|y), in terms of the Kullback-Leibler (KL) 
divergence KL (( 7 (pa, s, h, a‘^)\\p{pA, s, h, CT^|y)), within the 
class Q of distributions that factorize as 

q{pA, s, h, cr^) = q (p^) q (s) q (h) q (ct^) , (27) 

where q{s) = nf=i Il^Li H™* fcj), and g (h) = 

. Next, we present the fixed-point equations 
for mean field variational inference. These update equations 
may be derived by applying (12) to the joint pdf (17). We 
recall that ( 12 ) requires taking expectations of the relevant 
variables with respect to updated distribution q. If all distri¬ 
butions q (pa), q (h) and q (ct^) are initialized by choosing 
from the conjugate exponential prior family [39], [46] in a 
way being consistent with the priors in (17), namely q (p^i) 
being a Dirichlet distribution, q (h) being a circularly complex 
Gaussian distribution, and q (ct^) being an inverse Gamma dis¬ 
tribution, these fixed point update equations can be calculated, 
using a similar approach as in the Appendix I, as follows. 

1) The fixed point update equation for the mixture weight 
vector p^ can be expressed as 

q {p a) = Dirichlet (7 + g), (28) 

where g = [gi, ■ ■ ■ , 9\A\]’^ and ga = 

X]n,fe,mt [n,k]ea ^i^rnt [^^7 ^])- 

2) The update equation for the transmitted symbol 
Smt [n, k] is given by 

g(s™Jn,fc]) oc exp ( Inp [n, fcjlp^) 

( Inp (y[n, fc]|s[n, k],ll[n],a^) 2 ) 

(29) 

where the detailed expression of (29) is shown in Ap¬ 
pendix III. 


3) The equation for the channel vector is given by 

q O^rntnir) ~ ( 20 ) 

where 

= (31) 

= ^ (32) 

= (Dm* ) Wh„/, (33) 

and (D« is a diagonal matrix whose(n, n) ele¬ 

ment is equal to Yl!k=\ {\smt[n,k]\^). 

4) The fixed point update equation for the noise variance 
can be expressed as 

q{a^) =igia,l3), (34) 

where a = ctg -j- NKMr, and /? = /?□ -f with 

Pm,. = |ymj^ - 2 real y^^ ^ ^mt,m,. + 

mt -* 

rrit ,mr i (35) 

mt mt 

vp —fy n )wi] -I- 

D )Wh 1361 

and 

^mt,mr = ^mt,mP^ (Dmt) (D„'^ Wh^', 

(37) 

where we use tr[-] to denote the trace of a matrix. 

We summarize the proposed iterative mean field variational 
inference for modulation classification in Algorithm 2. 

Algorithm 2 Mean Field Variational Inference 

• Initialize q (pa), q (s), q (h) and q (^a'^') 

• for each iteration m = 1 : M 

- given the most updated distribution q (s), q (h) and 
q (cr^), update the distribution q{pA) using (28); 

- given the most updated distribution q (pyi)7 
q{s\smt['n,k]), g (h) and q{<J^), update the 
distribution q{smt[n,k\) using (29); 

- given the most updated distribution g (pa )7 q (s), 
g(h\hmt,m,.) and q(^a^), update the distribution 

using (30); 

- given the most updated distribution g (pyi)7 q (s), 
and g (h), update the distribution g (ct^) according 
to (34); 

• end for 

Remark 5: While Gibbs sampler is generally known to be 
superior to mean field as a method for exploring the global 
solution space, the mean field algorithm is known to have 
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Table I 

Computational Complexity for Gibbs sampling 


Unknowns 

Computational complexity for sampling 
the variable 

PA 

0{NK} 

s 

0{M'^MrNKMA} 

h 

0{MtMr[N K(N L + L'^ + MtN) + L^]} 

a" 

0{MtMrN'^K} 

Total 

0{NitMtMr[N K(N L + L'^ + MtN + 
MtMA) + L^]} 


better convergence speed in the vicinity of a local optima [38], 
[39], [40]. Following [39], we then propose a hybrid strategy 
strategy that switches from Gibbs sampling to mean field 
variational inference to “zoom in” on the local minimum of the 
optimization problem (11). Additional discussion on this point 
can be found in the next section. A break-down of the contribu¬ 
tion to the computational complexity of each iteration for the 
proposed Gibbs sampler are shown in Table I. In summary, 
the Gibbs sampler requires 0{NitMtMr[NK{NL -\- LF' + 
MtN + MtMjC) basic arithmetic operations, where Na 

is the total number of iterations and M _4 is the total number of 
states of all possible constellations, i.e., M 4 = SaeA 
each iteration, the number of the basic arithmetic operations 
required for mean field variational inference is of the same 
order of magnitude as that for Gibbs sampling. This similarity 
of the computational complexity of Gibbs sampling and mean 
field variational inference is also reported in [39], [40] and 
[47]. 

V. Numerical Results and Discussions 

In this section, we evaluate the performance of the pro¬ 
posed modulation classification schemes for the detection of 
three possible modulation formats within a MIMO-OFDM 
system. The performance criterion is the probability of correct 
classification assuming that all the modulations are equally 
likely. Normalized Rayleigh fading channels are assumed 
such that 11^] = 1. We define the average SNR 

as 10log(Mt/(T^). Unless stated otherwise, the following 
conditions are assumed; i) ^=(QPSK, 8 -PSK, 16QAM}; 
ii) Mt = Mr = 2 antennas; Hi) K = 2 OFDM sym¬ 
bols; and iv) L = 5 taps with relative powers given by 
[OdB, -4.2 dB, -11.5 dB, -17.6dB, -21.5dB]. 

A. Performance of Gibbs Sampling 

1) Gibbs Sampling with Restarts and Annealing: We first 
investigate the performance of the proposed Gibbs sampling 
algorithms with or without multiple random restarts and sim¬ 
ulated annealing within each run (see Remark 4). The number 
of runs in each process of Gibbs sampling with multiple 
random restarts is selected to be Nmn = 5, and the number 
of iterations in each run is M = 2000, where Mq = 0.85M 
initial samples are used as burn-in period.^ Note here that 
the total number of iterations required for Gibbs sampling 
is Nit = NrunM. All elements of the vector parameter 7 

^The samples in the burn-in period are not used to evaluate the average in 

(19). 


of the prior distribution pA ~ Dirichlet ( 7 ) are selected 
to be equal to a parameter 7 . As also reported in [20], it 
may be shown, via numerical results, that the modulation 
classification performance is not sensitive to the choice of 
parameter 7 as long as the value of the virtual observation 7 
(see Sec. IV-A2) is not very small (7 < 1). For the numerical 
experiments in this paper, we select the values of 7 to be equal 
to 8 % of the total number of symbols, e.g., in this example 
7 = [OMNKMt] = 40. 

In Fig. 3, the probabilities of correct classification for 
regular Gibbs sampling, Gibbs sampling with multiple random 
restarts, Gibbs sampling with annealing and Gibbs sampling 
with both multiple random restarts and annealing are plot¬ 
ted as a function of SNR. We also show for reference the 
performance of the ’superconstellation’ scheme, with both 
multiple random restarts and annealing, of [19] (see Remark 
3) extended to MIMO OFDM systems. From Fig. 3, it can 
be seen that the proposed strategy outperforms the approach 
in [19]. Moreover, both strategies of multiple random restarts 
and annealing improve the success rate, and that the best 
performance is achieved by Gibbs sampling with both random 
restarts and annealing. As discussed in Remark 4, annealing 
is seen to be especially effective in the high-SNR regime. 



Figure 3. Probability of correct classification using Gibbs sampling versus 
SNR (N = 128, Mt = Mr = 2, K = 2 and L = 5). 

2) Performance Under Incorrect Channel Length Esti¬ 
mates: Next, we study the effect of incorrect channel length 
estimates. The relative powers of the considered channel taps 
are [0 dB, —2 dB, —2.5 dB]. We considered the performance of 
the proposed scheme under overestimated, correctly estimated, 
or underestimated channel lengths. Specifically, the channel 
length estimates take three possible values, namely L = 1 , 
L = 3 or L = 5, while L = 3. The same values for 
the parameters of Gibbs sampler are used as in Sec. V-Al. 
Fig. 4 shows the probabilities of correct classification for 
Gibbs sampling with both random restarts and annealing 
versus SNR. It is observed that there is a minor performance 
degradation with an overestimated channel length. Here, for 
this example, the degradation caused by the overfitting when a 
more complex model with L = 5 is used is minor. In contrast, 
a more severe performance degradation is observed for the 
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Table II 

Confusion matrix of the proposed Gibbs sampler eor three 
MODULATION FORMATS AT 5 DB 



Estimated 

QPSK 

8-PSK 

16-QAM 

Actual 

QPSK 

96.3% 

2.7% 

1.0% 

8-PSK 

9.2% 

88.1% 

2.7% 

16-QAM 

15.4% 

18.6% 

66.0% 


case of the underestimated channel length. This signihcant 
degradation is caused by the bias introduced by the simpler 
model with L = 1. We also carried out the experiments for 
L = 5 taps with relative powers of [0 dB, —2 dB, —2.5 dB, 
—3.1 dB,—4.2 dB]. The performance is very similar to that for 
the case of L = 3 shown in Fig. 4 and is hence not reported 
here. 



Figure 4. Probability of connect classification using Gibbs sampling versus 
SNR with different channel length estimates L (N = 128, Mt = Mr = 2, 
K = 2 and L = 3). 

3) Performance with a Larger A: To study the effect 
of modulation pool A with a larger size, besides the three 
modulations considered above, we added 16-PSK into the 
modulation pool, i.e., yl={QPSK, 8-PSK, 16QAM, 16-PSK}. 
The relative powers of the multi-path components and the 
parameters of the Gibbs sampler take the same value as in Sec. 
V-A2. The performance of the proposed Gibbs sampler for the 
cases of three and four modulation formats is shown in Fig. 
4. As expected, some performance degradation is observed 
for a larger set of possible modulation schemes. To gain more 
insight into the classiher behavior, the confusion matrices for 
both cases of three and four modulation schemes for SNR 
of 5 dB are shown in Tables II and III, respectively. The 
confusion matrices show the probabilities of deciding for a 
given modulation format when another format is the correct 
one. For instance, the element associated with row ‘8-PSK’ 
and column ‘QPSK’ in Table II indicates that, when the actual 
modulation scheme is 8-PSK, the probability that the estimated 
modulation is QPSK is 9.2%. Comparing Table II with Table 
III, it can be seen that the decreased accuracy is mainly caused 
by the confusion between the two most similar modulation 
formats, namely 8-PSK and 16-PSK. 



Figure 5. Probability of correct classification using Gibbs sampling versus 
SNR with different sets A of possible modulation schemes {N = 128, Mt = 
Mr = 2, K = 2 and L = 3). 

Table III 

Confusion matrix of the proposed Gibbs sampler for four 
MODULATION FORMATS AT 5 DB 



Estimated 

QPSK 

8-PSK 

16-QAM 

16-PSK 

Actual 

QPSK 

96.8% 

1.8% 

0.6% 

0.8% 

8-PSK 

9.4% 

43.4% 

3.6% 

43.6% 

16-QAM 

18.8% 

11.6% 

60.6% 

9.0% 

16-PSK 

11.8% 

37.8% 

3.8% 

46.6% 


B. Performance of Mean Field Variational Inference 

Here, we study the performance of a hybrid scheme, in¬ 
spired by [39], that starts with a Gibbs sampler in order 
to perform a global search in the parameter space and then 
switches to mean held variational inference to speed up 
the convergence to a nearby local optima. In the switching 
iteration, all the needed marginal distributions for mean held 
variational inference are initialized as the conditional distribu¬ 
tions for Gibbs sampling in the previous iteration, namely the 
marginal distributions are initialized as follows, 

(38) 

[n, k]) 

s\Smt h, Cr^,y), (39) 

^p(nis-i) |Pa, s, h\h„j,y), (40) 

and 

9(-»)(a2)=p(™-i)(a2|p^s,h,y), (41) 

where denotes the index of the switching iteration. 

To demonstrate the effectiveness of this approach, we con¬ 
sider modulation classihcation using received samples within 
one OFDM frame (K = 1) over Rayleigh fading channels with 
two taps {L = 2) and with the relative powers [0 dB, —4.2 dB]. 
The SNR is 10 dB and the DFT size is V = 64 or 128. After 
the hrst eight iterations of regular Gibbs sampling (without 
random restarts and annealing), we switch to the mean held 
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variational inference. The probability of correct classibcation 
of both regular Gibbs sampling and the hybrid approach versus 
the number of iterations M with Mq = 0.9M burn-in samples 
is shown in Fig. 6. It can be observed that the hybrid approach 
outperforms Gibbs sampling especially when the number of 
iterations is small. 



Figure 6. Probability of correct classification using Gibbs sampling and mean 
field variational inference versus M (Mt = Mr = 2, K = 1, L = 2 and 
SNR=10dB). 


C. Comparison of Gibbs Sampling and ICA-PC [28] 

Here, we compare the classification results achieved by the 
proposed Gibbs sampling scheme with the ICA-PC approach 
of [28], which extends to MIMO-OFDM the techniques stud¬ 
ied in [22]. The approach in [28] exploits the invariance of the 
frequency-domain channels across the coherence bandwidth to 
perform classification. Specihcally, the subcarriers are grouped 
in sets of D adjacent subcarriers whose frequency-domain 
channel matrices are assumed to be identical. Let us denote 
the frequency-domain channel matrix and the received samples 
for the i-th group by and respectively, i = 1, ...,N/D. 
To compute the likelihood function p{yi\A = a,Hi) of 
the received samples over the subcarriers within group i, 
an estimate Hi of the channel matrix Hi is first obtained 
using ICA-PC, and then the likelihood p{yi\A = a,Hi) is 
approximated as p{yi\A = a, Hi). Accordingly, the likelihood 
function p(y|A = a,H) of all the received samples y is 
approximated as p{y\A = a, H) = Y[iP{yi\A = a, Hi), 
where H = . The detected modulation is selected 

as A = argmaxag^p(y|A = a, H). 

In Fig. 7, we plot the performance of the approach based 
on ICA-PC with different values of D and Gibbs sampling 
with random restarts and annealing. The number of runs are 
Nrun — 5, and the annealing schedule is (26). It can be 
seen from Fig. 7 that Gibbs sampling significantly outperforms 
ICA-PC. In this regard, note that, with D = 4, the accuracy in 
ICA-PC is poor due to the insufficient number of observed data 
samples; while with D = 16, the model mismatch problem 
becomes more severe due to the assumption of equal channel 
matrices in each subcarrier group. 


To further emphasize the advantages of the proposed 
Bayesian techniques, in Fig. 7, we consider the case in which 
there are fewer receive antennas than transmit antennas by 
setting Mr = 1 and all other parameters as above. While 
ICA-PC cannot be applied due to the ill-posedness of the 
ICA problem, it can be observed that a success rate of 71% 
can be attained by the proposed Gibbs scheme at 15 dB. It 
should be mentioned however, that the complexity of ICA-PC 
of the order of 0{MtMrKNM^*} [24] is smaller than that 
of Gibbs sampling (see Remark 5), where denotes the 
maximum number of states of a constellation among all the 
possible modulation types, i.e., M„ = maxag^{|a|}. 



Figure 7. Probability of correct classification using Gibbs sampling with 
multiple random restarts and annealing and approach of [28] based on ICA- 
PC versus SNR (N = 128, K = 2 and L = 5). 


D. Performance for Coded OFDM Signals 

To investigate the performance of the proposed Gibbs 
sampling approach in the presence of a model mismatch, 
we study here the case of coded OFDM. Specifically, we 
assume that the information bits are hrst encoded using a 
convolutional code, and then modulated. We apply Gibbs 
sampling with multiple random restarts and annealing with 
all relevant parameters being the same as in Sec. V-A. The 
code rates are 1/3, 1/2 and 2/3, respectively. In Fig. 8, the 
probability of correct classification is shown versus SNR. It 
can be seen from the figure that the success rate decreases only 
slightly (up to 6%) as the code rate decreases. The degradation 
is caused by the fact that the coded transmitted symbols are 
not mutually independent due to the convolutional coding, 
and hence their prior distribution is mismatched with respect 
to the model (17). As the SNR increases, the performance 
degradation for coded signals become minor, because, in this 
regime, Gibbs sampling relies more on the observed samples 
than on the priors. We also studied the performance of the 
proposed Gibbs sampling for OFDM systems with space- 
frequency coded symbols (SF-OFDM) [48]. Compared to the 
uncoded case, minor performance degradation (2% at 15 dB 
and up to 8% at 0 dB) is observed also due to the presence of 
a model mismatch. 
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Figure 8. Probability of correct classification using Gibbs sampling with 
multiple random restarts and annealing versus SNR for convolution coded 
OFDM signals with different code rate (N = 128, Mt = Mr = 2, K = 2 
and L = 5). 


VI. Conclusions 


In this paper, we have proposed two Bayesian modulation 
classification schemes for MIMO-OFDM systems based on a 
selection of the prior distributions that adopts a latent Dirichlet 
model and on the Bayesian network formalism. The proposed 
Gibbs sampling method converges to an effective solution 
and, using numerical results, its accuracy is demonstrated 
to improve for small sample sizes when switching to the 
mean field variational inference technique after a number of 
iterations. The speed of convergence is shown to improve via 
multiple random restarts and annealing. The techniques are 
seen to overcome the performance limitation of state-of-the-art 
non-Bayesian schemes based on ICA and Bayesian schemes 
based on “superconstellation” methods. In fact, while most 
of the mentioned existing modulation classification algorithms 
rely on the assumptions that the channels are flat fading, that 
the number of receive antennas is no less than the number 
of transmit antenna, and/or that a large amount of samples 
are available (as for pattern recognition-based methods), the 
proposed schemes achieve satisfactory performance under 
more general conditions. For example, with Mt = 2 transmit 
antennas and under frequency selective fading channels with 
L = 5 taps, a correct classification rate of above 97% may be 
attained with Mr = 2 receive antennas and with 256 received 
samples at each antenna; and a success rate of above 70% may 
be achieved with Mr = 1 receive antenna and 256 received 
samples at the antenna. Moreover, the proposed Gibbs sampler 
presents a graceful degradation in the presence of a model mis¬ 
match caused by channel coding, e.g., a decrease in the success 
rate by 6 % with a code rate of 1/3. Future works include 
devising a Gibbs sampling scheme that accounts for the effects 
of the timing and carrier frequency offsets for MIMO systems 
following, e.g., [21], [44], [49]. In addition, the development of 
Bayesian classification techniques that address non-Gaussian 
noise is also a topic for further investigation. 


Appendix I 
Distributions 

In this Appendix, we give the expressions for all standard 
distributions that are useful to derive the conditional distribu¬ 
tions for Gibbs sampling in Appendix II. 

1) Dirichlet Distribution: Z ~ Dirichlet (c), 


p(Z) = 


(E-=iC.) 


n 


Ci-l 

Za 


(42) 


nEir(G) 

where kz denotes the length of the vector Z, and r(-) 
stands for the gamma function [41]. 

2) Circular complex Gaussian distribution: Z ~ S), 

1 


p(Z) = 


det (S) 


exp{-(Z-p)'^S-i(Z-M)} 


(43) 

where we use det(-) to denote the determinant of a 
matrix. 

3) Inverse Gamma distribution:: 2 ; ~ IQ{c, d). 


p(Z) = 4^z-=-iexp 


(44) 


r(c) 

Appendix II 

Derivations of Conditional Distributions for 
Gibbs Sampling 

In this Appendix, the required conditional distributions for 
Gibbs sampling are derived. 

A. Expression for p(pa|s, h, y) 

Applying (10) to (17), we have 


P (PA|s,h,CT^y 

\ n p( Srrit [n,k]\pA) 

«n [p^ n 

a^A n,k,mt a: Srrn[n,k]^a 

= n j/ 

aej '“I 

~Dirichlet (7 -f c), 


PAia)/\a\ 


(45) 


where c = [ci, • • • , C|_ 4 |] , and Ca is the number of samples 
of transmitted symbols in constellation a & A. 

B. Expression for p{h.,nt,mr I (Pa, s, h\hmj ,mr •) ^ J y)) 

Applying (10) to (17), we have 

P (^nit,mr (PAjS, h\h„,,„,^,cr^y)^ 

I U 2 ^ 

P^,S,h,(7 


oc exp -f — h. 


H 

rrit 


2 real 




(t Y" 

l ^mt ,mr I 


h h 

,mr- ,mr 




( 46 ) 
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where 


and 




^ = —I + 




och 









with the element of the covariance matrix of the 

vector s[n, fc] 


(47) 


(48) 


cov(s[n,fc])(„,_^,,) 

=var(sm/ [n,k]) 

r( |s„j[n,A:]|^) - ( |s„;[n, fc]|)^ 

lo> 


if m'j = m", 

m'l ^ nit] (52) 

otherwise. 


the m't-th element of (s[n, being 


where the approximation in (47) follows from the fact that ah 
is very large such that the term 1 /q;?iI can be neglected. 




(smj [n,k]), 

Smt['n,k], 


if TOj ^ rrit 
if m[ = rrit' 


(53) 


C. Expression for p (ct^Ip^s, h, y) 
Applying (10) to (17), we have 

^'pAS,h,y ’ 


p[a 

(xp 


(X 


PAS,h,y) 

(^") P (: 


y 

2 \ -“ 0-1 


PA,S,h, cr^ 

f Po 


Mr- 


n 

rrir—i 


1 


r2NK 


exp 


nir—i 

2 \ —{oLQ-\-MrNK') — \. 


Mf 




• exp 


/3o + 'J2mr=i y-m-r 'l2mt=i .Ji 


(49) 


where a = cto + NKMr and B = /3o + 

II ' 

X/m,. yiriT- ~ X/m, '^mt^mt,n 


Appendix III 

Evaluation of (29) 

By taking expectation with respect to updated distribution 
q (pa) and q (s[n, k]\Smt h, ct^) receptively, it can be 
shown that the expression for ^ Inp (s^t [n, fc]|pA) )p^ is 

(lnp(smt [’^>^]Ip^)),(pa) 

= ^ 1 {Smt [n, k] e a) [f ( 7 a) - f (70) - In |a|], (50) 

aeA 

where 70 = X)aeA 7 “’ expression for 

(lnp(y[n,fc]|s[n,fc],H[n],cr2) ) is 

( Inp (y[n, /c]|s[n, fc], H[n], a^) [n.fc].h,.=) 

(x^|2real [y"[n,fc]] (H[n])(s[n, 

— tr ((H‘^[n]H[n])cov (s[n, k])) 

- (51) 


the element of (H[n]) being the n-th element of 

the matrix product Whm',m'i and the element of 

(H^[n]H[n]) being 








-H. 


if m[ = m'l; 

if m[ f m'l- 
(54) 
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